-
Notifications
You must be signed in to change notification settings - Fork 11
Add instrumentation and expose measurement in prometheus format #210
base: master
Are you sure you want to change the base?
Add instrumentation and expose measurement in prometheus format #210
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I noticed we use a lot of space indentation in Makefile, though I think only tab indentation should be used in Makefile. Will use separate PR to change that indentation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add at least one actual metric to this PR? I understand that it'll make the PR larger, but it would also make it easier to see what the abstractions are and how we plan to track metrics in the client code. Also note that it's not easy to understand what "exposition" means in this context.
src/api_server.cc
Outdated
conn->write(response); | ||
} | ||
|
||
std::string MetadataApiServer::SerializeMetricsToPrometheusTextFormat() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest creating a Telemetry
class to wrap this whole Collectable
business and the registry and text serialization. Then you could have a field telemetry_
of type const Telemetry&
, and this method can be replaced with telemetry_.MetricsToString()
(or even telemetry_.ToString()
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the current implementation, it is implemented as function inside google::internal
namespace, https://github.com/Stackdriver/metadata-agent/pull/210/files#diff-bcca2b17db5d9ee49a78774f858b95e4R25 . Two reasons:
- It can be implemented as a function, instead of a class
- implement the function inside
google::internal
namespace enables us to test this function, but does not need to keep an membertelemetry_
inside theMetadataApiServer
class
@g-easy can you take a brief look on this PR? I want to get your feedback on
|
As we add more metrics, some of the metrics are other objects - e.g. KubernetesUpdater. View registration should happen after all the metrics are accessed. The lifecycle of all the measure are registered outside of MetadataAgent, so move view registration to the last step in main function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Build system stuff looks reasonable.
Test looks great.
Couple of comments on initialization.
src/internal/measures_utils.cc
Outdated
|
||
namespace { | ||
|
||
::prometheus::TextSerializer text_serializer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are text_serializer and exporter trivially constructible? Maybe make them both static pointers inside SerializeMetricsToPrometheusTextFormat
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or even create them every time SerializeMetricsToPrometheusTextFormat
is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now creating objects every time when SerializeMetricsToPrometheusTextFormat
is called, because I cannot make sure whether my implementation on static pointer is the right approach, so eventually switch to object approach.
As for now, SerializeMetricsToPrometheusTextFormat()
will only be called when server is called on /metrics
specifically, so I don't think it will have huge performance impact and don't want to optimize early without my solid knowledge on static pointer approach.
change from RFC to normal one, because this one has tests included. high level review is still welcome first so we can get feedback loop soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High-level comments. I did not look at the Makefile
changes yet — those will take more effort to review.
[submodule "lib/prometheus-cpp"] | ||
path = lib/prometheus-cpp | ||
url = https://github.com/jupp0r/prometheus-cpp | ||
[submodule "lib/opencensus-cpp"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm always wary of targeting an untagged and unreleased commit. We can fix this later, but just wanted to note this as a concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
acknowledged. I just follow the pattern we used on other submodules. Once opencensus-cpp offically release their new version, we should check in that specific version. for now, I targeted on a commit which includes all the functionality we need, including prometheus-exporter.
src/internal/measures_utils.cc
Outdated
|
||
namespace { | ||
|
||
::prometheus::TextSerializer text_serializer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or even create them every time SerializeMetricsToPrometheusTextFormat
is called.
…o not know they will match google style guide.
|
||
// View Descriptor accessors. If the view descriptor variable is not | ||
// initialized, these methods will initialize the variable. | ||
static const ::opencensus::stats::ViewDescriptor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no. I tried to implement GetGceApiRequestErrorsCumulativeViewIntData
, but it the data cannot propogated to this view because 1. the view needs to be created before the metric is recorded, and 2. for testing, we need to call TestUtils::Flush()
to propagate to view. If this functino is private, it cannot be used in oauth2_unittest
. We may put it as private function and make a class in oauth2_unittest
as friend class, but if we add more metrics across files, you will friend a couple more - so I am not sure that's what we want.
In order to create a view from the view descriptor, and run TestUtils::Flush()
, I put this function as public function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment about ::opencensus::stats::StatsExporter::GetViewData()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note that the standard trick for accessing private functions in tests is to mark them protected
in the implementation class, and add a subclass in the test that makes them public
. This can be made to work on non-virtual and static functions as well.
|
||
// View Descriptor accessors. If the view descriptor variable is not | ||
// initialized, these methods will initialize the variable. | ||
static const ::opencensus::stats::ViewDescriptor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment about ::opencensus::stats::StatsExporter::GetViewData()
.
.set_measure(measure_name) | ||
.set_aggregation(opencensus::stats::Aggregation::Count()); | ||
// remove view and flush the result before the actual test. | ||
::opencensus::stats::StatsExporter::RemoveView(view_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we clean up in line 57, why do we also need to clean up here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make the test self-contained, you need to avoid other views with view_name
registered. In current unittest there is only one view called test_view
, so it shouldn't be an issue; however if other view registered the view_name already, the data of the view will be aggregated on top of the original view.
The view on line 57 is to be a good citizen to clean up the view, so other test cases won't be affected by this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, can we make the view name unique to this test, then (e.g., by using __FILE__
and __LINE__
)? It seems weird that the first thing we do is unregister a newly-created view.
test/oauth2_unittest.cc
Outdated
|
||
auth.GetAuthHeaderValue(); | ||
|
||
// Flush records to propagate to views. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment belongs on the next statement, doesn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flush records to propagate to the view we first initialize in line 148 - this comment explains the next two lines.
"test_view_1 1.000000 [0-9]*\n")); | ||
|
||
// clean up the view which we registered for export. | ||
::opencensus::stats::StatsExporter::RemoveView(view_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reuse the Finally
class from oauth2.cc
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but won't do it because it's in another .cc file - in order to use it, we need to refactor, but I don't think it's good idea to add much complexity to this already complex PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather pay the cost of 10 lines of code duplication for a proper RAII pattern than risk the flakiness resulting from forgetting to clean this up in the future.
::opencensus::stats::StatsExporter::RemoveView(::google::Metrics::kGceApiRequestErrors); | ||
::opencensus::stats::testing::TestUtils::Flush(); | ||
|
||
::google::Metrics::RecordGceApiRequestErrors(1, "test_kind"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this work? If you've removed the view from StatsExporter
and never re-registered it, how are those values getting exported into SerializeMetricsToPrometheusTextFormat
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand - I think you suggested that when we register the measurement
internally, we also register the view
for it?
For test, to make this testcase not affected by other test, we need to remove the view first (first 2 lines) and let the logic inside RecordGceApiRequestErrors
register the view and record the measurement, so later the value is exported by SerializeMetricsToPrometheusTextFormat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I did suggest that, but the fact that we remove the view and expect it to be re-registered threw me off a bit in this test. A comment to remind readers of this setup would be helpful. In fact, because the result of GceApiRequestErrorsInitialize
is cached, would the view even be re-registered at all?
As for having to unregister the view in the first place, this strikes me as pretty fragile, and, as I said in another comment, gives me no confidence that we're testing what we're using in production. I can see a couple of ways of addressing this, the most sensible being to retrieve the value for this metric using ::opencensus::stats::StatsExporter::GetViewData()
[1] before calling the method under test and do the same after calling the method under test, and check that the value changed in the expected way. We could then change this specific test to make sure that if we touch the GceApiRequestErrors
counter, it appears in the prometheus output with some value. Does this make sense?
[1] I wish opencensus-cpp
provided a way to retrieve the counter value by view name directly from the StatsExporter
.
test/oauth2_unittest.cc
Outdated
|
||
// Remove existing view on kGceApiRequestErrors to avoid | ||
// other tests affecting the result. | ||
// Flush to propagate existing records to views, including records from other |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this comment belong on the next statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why it cannot be grouped into a section of comments?
TEST(SerializeToPrometheusTextTest, SingleOpencensusView) { | ||
const char measure_name[] = "test_measure"; | ||
const char view_name[] = "test_view"; | ||
const ::opencensus::stats::MeasureInt64 test_measure = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proposed function could be reused verbatim to implement GceApiRequestErrors
(instead of GceApiRequestErrorsInitialize
)... Basically, I'm concerned about the amount of setup these tests are doing, and it seems like this is fragile in that it could easily diverge from the setup in production code. The closer we can get this to production code, the more confidence we would have that we're testing what we're using.
"test_view_1 1.000000 [0-9]*\n")); | ||
|
||
// clean up the view which we registered for export. | ||
::opencensus::stats::StatsExporter::RemoveView(view_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather pay the cost of 10 lines of code duplication for a proper RAII pattern than risk the flakiness resulting from forgetting to clean this up in the future.
.set_measure(measure_name) | ||
.set_aggregation(opencensus::stats::Aggregation::Count()); | ||
// remove view and flush the result before the actual test. | ||
::opencensus::stats::StatsExporter::RemoveView(view_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, can we make the view name unique to this test, then (e.g., by using __FILE__
and __LINE__
)? It seems weird that the first thing we do is unregister a newly-created view.
::opencensus::stats::StatsExporter::RemoveView(::google::Metrics::kGceApiRequestErrors); | ||
::opencensus::stats::testing::TestUtils::Flush(); | ||
|
||
::google::Metrics::RecordGceApiRequestErrors(1, "test_kind"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I did suggest that, but the fact that we remove the view and expect it to be re-registered threw me off a bit in this test. A comment to remind readers of this setup would be helpful. In fact, because the result of GceApiRequestErrorsInitialize
is cached, would the view even be re-registered at all?
As for having to unregister the view in the first place, this strikes me as pretty fragile, and, as I said in another comment, gives me no confidence that we're testing what we're using in production. I can see a couple of ways of addressing this, the most sensible being to retrieve the value for this metric using ::opencensus::stats::StatsExporter::GetViewData()
[1] before calling the method under test and do the same after calling the method under test, and check that the value changed in the expected way. We could then change this specific test to make sure that if we touch the GceApiRequestErrors
counter, it appears in the prometheus output with some value. Does this make sense?
[1] I wish opencensus-cpp
provided a way to retrieve the counter value by view name directly from the StatsExporter
.
std::string(test_info_->name()) + "_creds.json", | ||
"{\"client_email\":\"user@example.com\",\"private_key\":\"some_key\"}"); | ||
Configuration config(std::istringstream( | ||
"CredentialsFile: '" + credentials_file.FullPath().native() + "'\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: should this be a 2-space indent?
// affecting the result, then flush to propagate existing records | ||
// to views, including records from other tests. | ||
::opencensus::stats::StatsExporter::RemoveView( | ||
::google::Metrics::kGceApiRequestErrors); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're inside the google
namespace here. Just Metrics::kGceApiRequestErrors
should do.
// StatsExporter::AddView always add a new view and destroy the old one, | ||
// which is not suitable for our | ||
// Metrics::GceApiRequestErrorsCumulativeViewDescriptor(). | ||
::google::Metrics::GceApiRequestErrorsCumulativeViewDescriptor(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're inside the google
namespace here. Just Metrics::GceApiRequestErrorsCumulativeViewDescriptor()
should do.
::testing::Contains(::testing::Pair( | ||
Metrics::GceApiRequestErrorsCumulativeViewDescriptor(), | ||
::testing::Property( | ||
&::opencensus::stats::ViewData::int_data, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we pull this out into Metrics::GetInt64Data(const std::string& name)
? That way this test could become:
EXPECT_THAT(Metrics::GetInt64Data(Metrics::kGceApiRequestErrors),
::testing::UnorderedElementsAre(
::testing::Pair(::testing::ElementsAre("oauth2"), 1)));
|
||
// View Descriptor accessors. If the view descriptor variable is not | ||
// initialized, these methods will initialize the variable. | ||
static const ::opencensus::stats::ViewDescriptor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note that the standard trick for accessing private functions in tests is to mark them protected
in the implementation class, and add a subclass in the test that makes them public
. This can be made to work on non-virtual and static functions as well.
public: | ||
OAuth2(const Environment& environment) | ||
: OAuth2(environment, ExpirationImpl<std::chrono::system_clock>::New()) {} | ||
: OAuth2(environment, ExpirationImpl<std::chrono::system_clock>::New()) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we change the indentation here? Should we just revert this file altogether?
Add interface inside api_server, so we expose instrumentation metrics later on (currently working on it).
Some thoughts:
MetadataApiServer::SerializeMetricsToPrometheusTextFormat()
so we can test the functionality without considering http_connection, which isn't tested in current test suites.